-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow #32106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137129 has finished for PR 32106 at commit
|
| // Allocate a new byte array, and copy the inputs one by one into it. | ||
| // The size of the new array is the size of all inputs, plus the separators. | ||
| final byte[] result = new byte[numInputBytes + (numInputs - 1) * separator.numBytes]; | ||
| int intNumInputBytes = Ints.checkedCast(numInputBytes + (numInputs - 1) * separator.numBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you use guava here instead of just checking numInputBytes > Int.MaxValue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code just follow the concat that do the same check.
| final byte[] result = new byte[Ints.checkedCast(totalLength)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use Math.toIntExact, and replace Ints.checkedCast by toIntExact, please. I don't see any benefits of the Guava function over the standard function, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced it.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137187 has finished for PR 32106 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137196 has finished for PR 32106 at commit
|
| int intNumInputBytes = Math.toIntExact(numInputBytes + (numInputs - 1) * separator.numBytes); | ||
| final byte[] result = new byte[intNumInputBytes]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to prevent overflow, we should do that for all operators:
| int intNumInputBytes = Math.toIntExact(numInputBytes + (numInputs - 1) * separator.numBytes); | |
| final byte[] result = new byte[intNumInputBytes]; | |
| int resultSize = Math.toIntExact(Math.addExact( | |
| numInputBytes, | |
| Math.multiplyExact(numInputs - 1, separator.numBytes))); | |
| final byte[] result = new byte[resultSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numInputBytes has changed to long. you mean (numInputs - 1) * separator.numBytes can overflow ? yeah, maybe it can be. How about this ?
int intNumInputBytes = Math.toIntExact(
numInputBytes + (numInputs - 1) * (long)separator.numBytes);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean (numInputs - 1) * separator.numBytes can overflow ?
yep
How about this ?
I am ok with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
| // The size of the new array is the size of all inputs, plus the separators. | ||
| final byte[] result = new byte[numInputBytes + (numInputs - 1) * separator.numBytes]; | ||
| int intNumInputBytes = Math.toIntExact( | ||
| numInputBytes + (numInputs - 1) * (long)separator.numBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you re-check the indentation. I guess it should be smaller.
| int intNumInputBytes = Math.toIntExact( | ||
| numInputBytes + (numInputs - 1) * (long)separator.numBytes); | ||
| final byte[] result = new byte[intNumInputBytes]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, you can put this in one line:
| int intNumInputBytes = Math.toIntExact( | |
| numInputBytes + (numInputs - 1) * (long)separator.numBytes); | |
| final byte[] result = new byte[intNumInputBytes]; | |
| int resultSize = Math.toIntExact(numInputBytes + (numInputs - 1) * (long)separator.numBytes); | |
| final byte[] result = new byte[resultSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137203 has finished for PR 32106 at commit
|
|
Test build #137206 has finished for PR 32106 at commit
|
|
+1, LGTM. Merging to master. |
|
late lgtm |
|
@ulysses-you Would you like to review other places in UTF8String and make similar changes. For instance:
|
|
@MaxGekk created ticket SPARK-35041 |
|
created #32142 |
What changes were proposed in this pull request?
Add check if the byte length over
int.Why are the changes needed?
We encounter a very extreme case with expression
concat_ws, and the error msg isSeems the
UTF8String.concathas already done the length check at #21064, so it's better to add inconcatWs.Does this PR introduce any user-facing change?
Yes
How was this patch tested?
It's too heavy to add the test.